A Framework for General Sparse Matrix-Matrix Multiplication on GPUs and Heterogeneous Processors
نویسندگان
چکیده
General sparse matrix-matrix multiplication (SpGEMM) is a fundamental building block for numerous applications such as algebraic multigrid method (AMG), breadth first search and shortest path problem. Compared to other sparse BLAS routines, an efficient parallel SpGEMM implementation has to handle extra irregularity from three aspects: (1) the number of nonzero entries in the resulting sparse matrix is unknown in advance, (2) very expensive parallel insert operations at random positions in the resulting sparse matrix dominate the execution time, and (3) load balancing must account for sparse data in both input matrices. In this work we propose a framework for SpGEMM on GPUs and emerging CPU-GPU heterogeneous processors. This framework particularly focuses on the above three problems. Memory pre-allocation for the resulting matrix is organized by a hybrid method that saves a large amount of global memory space and efficiently utilizes the very limited on-chip scratchpad memory. Parallel insert operations of the nonzero entries are implemented through the GPU merge path algorithm that is experimentally found to be the fastest GPU merge approach. Load balancing builds on the number of necessary arithmetic operations on the nonzero entries and is guaranteed in all stages. Compared with the state-of-the-art CPU and GPU SpGEMM methods, our approach delivers excellent absolute performance and relative speedups on various benchmarks multiplying matrices with diverse sparsity structures. Furthermore, on heterogeneous processors, our SpGEMM approach achieves higher throughput by using re-allocatable shared virtual memory.
منابع مشابه
Investigating the Effects of Hardware Parameters on Power Consumptions in SPMV Algorithms on Graphics Processing Units (GPUs)
Although Sparse matrix-vector multiplication (SPMVs) algorithms are simple, they include important parts of Linear Algebra algorithms in Mathematics and Physics areas. As these algorithms can be run in parallel, Graphics Processing Units (GPUs) has been considered as one of the best candidates to run these algorithms. In the recent years, power consumption has been considered as one of the metr...
متن کاملGeneral-Purpose Sparse Matrix Building Blocks using the NVIDIA CUDA Technology Platform
We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floatingpoint co-processors to accelerate two fundamental computational scientific kernels on the GPU: sparse direct factorization and nonlinear interior-point optimization. Since a full re-implementation of these complex kernels is typically not feasible, we identify e.g. the matrix-matrix ...
متن کاملA New Sparse Matrix Vector Multiplication GPU Algorithm Designed for Finite Element Problems
Recently, graphics processors (GPUs) have been increasingly leveraged in a variety of scientific computing applications. However, architectural differences between CPUs and GPUs necessitate the development of algorithms that take advantage of GPU hardware. As sparse matrix vector multiplication (SPMV) operations are commonly used in finite element analysis, a new SPMV algorithm and several vari...
متن کاملAlgorithmic performance studies on graphics processing units
We report on our experience with integrating and using graphics processing units (GPUs) as fast parallel floatingpoint co-processors to accelerate two fundamental computational scientific kernels on the GPU: sparse direct factorization and nonlinear interior-point optimization. Since a full re-implementation of these complex kernels is typically not feasible, we identify the matrix-matrix multi...
متن کاملSpeculative segmented sum for sparse matrix-vector multiplication on heterogeneous processors
Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- J. Parallel Distrib. Comput.
دوره 85 شماره
صفحات -
تاریخ انتشار 2015